Large numbers of explanatory variables, a semi-descriptive analysis.

Authors

  • D R Cox
  • H S Battey
Abstract

Data with a relatively small number of study individuals and a very large number of potential explanatory features arise particularly, but by no means only, in genomics. A powerful method of analysis, the lasso [Tibshirani R (1996) J Roy Stat Soc B 58:267-288], takes account of an assumed sparsity of effects, that is, that most of the features are nugatory. Standard criteria for model fitting, such as the method of least squares, are modified by imposing a penalty for each explanatory variable used. There results a single model, leaving open the possibility that other sparse choices of explanatory features fit virtually equally well. The method suggested in this paper aims to specify simple models that are essentially equally effective, leaving detailed interpretation to the specifics of the particular study. The method hinges on the ability to make initially a very large number of separate analyses, allowing each explanatory feature to be assessed in combination with many other such features. Further stages allow the assessment of more complex patterns such as nonlinear and interactive dependences. The method has formal similarities to so-called partially balanced incomplete block designs introduced 80 years ago [Yates F (1936) J Agric Sci 26:424-455] for the study of large-scale plant breeding trials. The emphasis in this paper is strongly on exploratory analysis; the more formal statistical properties obtained under idealized assumptions will be reported separately.
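The abstract gives the method only in outline. A minimal sketch can still illustrate the central idea of assessing each feature in many separate small analyses alongside different companions. It assumes a two-dimensional square arrangement: p = m² features are placed on an m × m grid, an ordinary least-squares fit is made separately on each row and each column, and a feature is retained if its |t|-statistic is among the `keep` largest in both its row and its column analysis. The grid layout, the retention rule, and the names `square_screen`, `t_statistics` and `invert` are hypothetical choices for this illustration, not the authors' published procedure. The square layout was picked because square lattices are a classical example of partially balanced incomplete block designs. The later stages for nonlinear and interactive dependences are not attempted.

```python
import math
import random

# Hypothetical sketch of a square-lattice screen of p = m*m features.
# The layout and the retention rule are illustrative assumptions,
# not the procedure published by Cox and Battey.


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_statistics(cols, y):
    """OLS t-statistic for each column; the intercept is removed by centring."""
    n, k = len(y), len(cols)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    xc = []
    for c in cols:
        mean = sum(c) / n
        xc.append([v - mean for v in c])
    xtx = [[sum(a * b for a, b in zip(xi, xj)) for xj in xc] for xi in xc]
    xty = [sum(a * b for a, b in zip(xi, yc)) for xi in xc]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yc[t] - sum(beta[j] * xc[j][t] for j in range(k)) for t in range(n)]
    s2 = sum(r * r for r in resid) / (n - k - 1)  # k slopes plus one intercept
    return [beta[j] / math.sqrt(s2 * inv[j][j]) for j in range(k)]


def square_screen(X, y, m, keep=2):
    """Retain features in the top `keep` by |t| in both their row and column fits.

    Feature f sits at (row, col) = divmod(f, m), so across its two analyses
    each feature is fitted alongside 2 * (m - 1) other features.
    """
    if len(X) != m * m:
        raise ValueError("expected exactly m * m feature columns")

    def top(features):
        t = t_statistics([X[f] for f in features], y)
        ranked = sorted(zip(features, t), key=lambda ft: abs(ft[1]), reverse=True)
        return {f for f, _ in ranked[:keep]}

    row_hits, col_hits = set(), set()
    for r in range(m):
        row_hits |= top([r * m + c for c in range(m)])
    for c in range(m):
        col_hits |= top([r * m + c for r in range(m)])
    return sorted(row_hits & col_hits)


# Simulated example: 25 standard-normal features, signal only in features 3 and 17.
rng = random.Random(1)
n, m = 80, 5
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m * m)]
y = [3 * X[3][t] + 3 * X[17][t] + rng.gauss(0, 1) for t in range(n)]
print(square_screen(X, y, m, keep=2))
```

With two strong signals, features 3 and 17 should both appear in the retained list. It may also include a few null features that happen to rank highly in both their row and column. In the spirit of the abstract, such a list would be a starting point for further stages and for study-specific interpretation, not a single final model as the lasso would give.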

Similar resources

MARCINKIEWICZ-TYPE STRONG LAW OF LARGE NUMBERS FOR DOUBLE ARRAYS OF NEGATIVELY DEPENDENT RANDOM VARIABLES

In the following work we present a proof for the strong law of large numbers for pairwise negatively dependent random variables which relaxes the usual assumption of pairwise independence. Let be a double sequence of pairwise negatively dependent random variables. If for all non-negative real numbers t and , for 1 < p < 2, then we prove that (1). In addition, it also converges to 0 in ....

Laws of Large Numbers for Random Linear

The computational solution of large-scale linear programming problems involves various difficulties. One is ensuring numerical stability. Another, of a different nature, is that the original data contain errors as well. In this paper, we show that the effect of the random errors in the original data has a diminishing tendency for the optimal value as the...

ON THE LAWS OF LARGE NUMBERS FOR DEPENDENT RANDOM VARIABLES

In this paper, we extend and generalize some recent results on the strong laws of large numbers (SLLN) for pairwise independent random variables [3]. No assumption is made concerning the existence of independence among the random variables (henceforth r.v.’s). Also Chandra’s result on Cesàro uniformly integrable r.v.’s is extended.

A Note on the Strong Law of Large Numbers

Petrov (1996) proved the connection between general moment conditions and the applicability of the strong law of large numbers to a sequence of pairwise independent and identically distributed random variables. This note examines this connection for a sequence of pairwise negative quadrant dependent (NQD) and identically distributed random variables. As a consequence of the main theorem ...

SOME PROBABILISTIC INEQUALITIES FOR FUZZY RANDOM VARIABLES

In this paper, the concepts of positive dependence and linearly positive quadrant dependence are introduced for fuzzy random variables. Also, an inequality is obtained for partial sums of linearly positive quadrant dependent fuzzy random variables. Moreover, a weak law of large numbers is established for linearly positive quadrant dependent fuzzy random variables. We extend some well known inequ...

Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 114, Issue 32

Publication date: 2017